Optimal Bit Allocation for Error Resilient Video Transcoding 

Field of the Invention 

[01] This invention relates generally to transcoding videos, and more particularly 
to dynamically allocating bits according to rate and distortion characteristics while 
transcoding videos. 

Background of the Invention 

[02] Transmitting a video bitstream through wireless channels is a challenging 
problem due to limited bandwidth and channel noise. If a video is originally 
coded at a bit rate greater than the bandwidth available in a wireless channel, then the 
video must first be transcoded to a lower bit rate prior to transmission. Because a 
noisy channel can easily degrade the quality of the video, there is also a need to make 
the encoded video bitstream resilient to transmission errors, even though the overall 
number of bits allocated to the bitstream is reduced. 

[03] Two primary methods used for error-resilient video encoding are 
resynchronization marker insertion and intra-block insertion (intra-refresh). Both 
methods are effective at localizing errors. If the errors are localized, then recovery 
from errors is facilitated. 

[04] Resynchronization inserts periodic markers so that when an error occurs, 
decoding can be restarted at a point where the last resynchronization marker was 
inserted. In this way, errors are spatially localized. There are two basic approaches 
for inserting synchronization markers: a group-of-block (GOB) based approach, 
which is adopted in the H.261/H.263 standard, and a packet-based approach, which 
is adopted in the MPEG-4 standard. 

[05] In the GOB-based approach, a GOB header is inserted periodically after a 
certain number of macroblocks (MBs). In the packet-based approach, header 
information is placed at the start of each packet. Because the way the packets are 
formed is based on the number of bits, the packet-based approach is generally more 
uniform than the GOB-based approach. 

[06] While resynchronization marker insertion is suitable to provide a spatial 
localization of errors, the insertion of intra MBs is used to provide a temporal 
localization of errors by decreasing the temporal dependency in the encoded video 
bitstream. 

[07] A number of error resilience video encoding methods are known. In 
"Error-resilient transcoding for video over wireless channels," IEEE Journal on Selected 
Areas in Communications, vol. 18, no. 6, pp. 1063-1074, 2000, by Reyes, et al., 
optimal bit allocation between error resilience insertion and video encoding is 
achieved by modeling the rate-distortion of error propagation due to channel 
errors. However, that method assumes that the actual rate-distortion characteristics 
of the video are known, which makes the optimization difficult to realize 
practically. Also, that method does not consider the impact of error concealment. 

[08] In "Optimal mode selection and synchronization for robust video 
communications over error-prone networks," IEEE Journal on Selected Areas in 
Communications, vol. 18, no. 6, pp. 952-965, 2000 by Cote, et al., the optimal 
error resilience insertion problem is divided into two sub-problems: optimal mode 
selection for MBs; and optimal resynchronization marker insertion. That 
optimization is conducted on an MB basis and inter-frame dependency is not 
considered. 

[09] Another method described by Zhang, et al., "Video coding with optimal 
inter/intra-mode switching for packet loss resilience," IEEE Journal on Selected 
Areas in Communications, vol. 18, no. 6, pp. 966-976, 2000, determines 
recursively a total decoder distortion with pixel-level precision to account for 
spatial and temporal error propagation in a packet loss environment. That method 
attempts to select an optimal MB encoding mode. That method is quite accurate on 
the MB level when compared with other methods. However, that method does not 
consider the inter-frame dependency and the optimization is only conducted on the 
current MB. 

[010] Dogan, et al. describe a video transcoding framework for general packet 
radio service (GPRS) in "Error-resilient video transcoding for robust inter-network 
communications using GPRS," IEEE Transactions on Circuits and Systems for 
Video Technology, vol. 12, no. 6, pp. 453-464, 2002. However, the bit allocation 
between inserted error resilience and the video encoding is not optimized in that 
method. 

[011] For video distortion caused by channel errors, a low complexity video 
quality model has been described by Reibman et al., in "Low-complexity quality 
monitoring of MPEG-2 video in a network," in Proceedings IEEE International 
Conference on Image Processing, September 2003. However, the measurement to 
determine error propagation effects is only based on the received bitstream. One of 
the most important aspects that is not fully considered by that method is the issue 
of inter-frame dependency, which is a key factor in motion compensated video 
encoding. Often, bit allocation and encoding mode selection are optimized only for 
the current MB or the current frame. 

[012] It is desired to provide an optimal solution that reduces the video bit rate 
while maintaining error resilience. It is also desirable to have models that account 
for inter-frame dependency, which is inherent to many coding schemes, and also 
accurately account for the propagation of errors at the receiver. This is especially 
important when a video bit stream is transferred from a channel with a high 
bandwidth and a low bit-error-rate (BER), for example, a wired channel, to a 
channel with a low bandwidth and a high BER, for example, a wireless channel. 
For such a low bandwidth channel, the combined task of bit rate reduction and 
error resilience insertion is essential because the bit rate reduction needs to be 
balanced against the additional error resilience bits. 

Summary of the Invention 

[013] The invention provides for transcoding a video for transmission in an 
error-prone channel. The invention optimizes the allocation of bits between the video 
source and error resilience such that an end-to-end distortion is minimized 
under a given rate constraint and a given channel condition. 

[014] The bit rate for the video is reduced by requantization, while the bits for 
error resilience are controlled by inserting resynchronization markers and intra-coded 
blocks. 

[015] The invention makes use of rate-distortion (R-D) models for requantizing 
the video based on inter-frame dependencies, as well as R-D models for error 
propagation in a motion compensated video. Based on these models, the invention 
uses a dynamic and optimal bit allocation scheme. 

[016] To account for the inter-frame dependencies, the bit allocation scheme 
operates on a group-of-pictures (GOP). The optimal allocation scheme achieves 
better PSNR than fixed bit allocation schemes of the prior art. 

[017] The invention also provides an alternative allocation scheme that achieves 
similar performance as the optimal scheme, but with a much lower complexity. 

Brief Description of the Drawings 

[018] Figure 1 is a block diagram of rate-distortion models and a transcoding 
method according to the invention; 

[019] Figure 2 is a block diagram of a video transcoder according to the invention; 

[020] Figure 3 is a block diagram of a video system according to the invention; 

[021] Figure 4 is a block diagram of a spatial concealment method used by the 
invention; 

[022] Figures 5 and 6 are block diagrams of decomposing distortion for I- and 
P-frames of a video caused by channel errors; 

[023] Figure 7 is a graph comparing resynchronization marker insertion accuracy; 
and 

[024] Figure 8 is a graph comparing intra-block insertion accuracy. 

Detailed Description of the Preferred Embodiment 

[025] As shown in Figure 1, the invention provides a method for transcoding 100 an 
input video bitstream 101 so that a bit rate in an output bitstream 102 is reduced 
while maintaining error resilience under a given bit rate constraint and channel 
condition. The method 100 subjects the input video to three rate-distortion (R-D) 
models: a video source requantization model 111, a resynchronization marker model 
112, and an intra-block refresh model 113. The outputs of the three models are input 
to a bit allocation control module 120, which determines a quantization parameter 
121, an intra-block refresh rate 122 and a resynchronization marker rate 123. These 
parameters are used by a transcoder 130 to form the output bitstream 102. 

[026] The three models are novel in that inter-frame dependency is included in both 
a video source model and an error resilience model. In addition, the error resilience 
model in the transcoding considers error concealment at the receiver. 

[027] The invention also provides an alternative embodiment of the transcoding 
method that achieves near-optimal performance at a lower complexity. 

[028] Transcoder Structure 

[029] Figure 2 shows a transcoder 200 according to the invention. The transcoder 
includes a decoder 210 and an encoder 220. The decoder 210 takes an input video 
bitstream 101 at a first bit rate. The encoder produces an output bitstream 102 at a 
second bit rate. In a typical application, the second bit rate is less than the first bit 
rate. 

[030] The decoder 210 includes a variable length decoder (VLD) 211, a first inverse 
quantizer ($Q_1^{-1}$) 212, an inverse discrete cosine transform (IDCT) 213, a motion 
compensation (MC) block 214, and a first frame store 215. 

[031] The encoder 220 includes a variable length coder (VLC) 221, a quantizer ($Q_2$) 
222, a discrete cosine transform (DCT) 223, a motion compensation (MC) block 224, and a 
second frame store 225. The transcoder also includes a second inverse quantizer 
($Q_2^{-1}$) 226 and a second IDCT 227. 

[032] In addition, the encoder includes an intra/inter switch 228 and a 
resynchronization marker insertion block 229. 

[033] The bit allocation 120 of Figure 1 provides the quantization parameter 121 to 
the quantizer 222, the resynchronization marker rate 122 to the resynchronization 
marker insertion block 229 and the intra-block refresh rate 123 to the intra/inter 
switch 228. 

[034] Problem Statement 

[035] It is an object of the invention to minimize an end-to-end distortion of the 
encoded video bitstream subject to rate constraints. An overall rate budget is 
allocated among the three different components that contribute to the rate, i.e., video 
source requantization, resynchronization marker insertion, and intra-refresh. 

[036] To achieve this object, models for the three distinct components are described: 
the video source requantization model, the intra-refresh model, and the 
resynchronization marker insertion model. The latter two model error resilience. Although there 
is some degree of dependency among these three components, each component has 
a unique impact on the R-D characteristics of the transcoded video under different 
channel conditions. 

[037] The video source model accounts for the R-D characteristics of the video 
bitstream without resynchronization markers or intra-refresh insertion, while the 
error-resilience models account for the R-D characteristics of intra-block insertion 
and resynchronization marker insertion. 

[038] Although the separation of the error resilience model from the video source 
model is an approximation, it turns out to be quite accurate for the R-D optimized 
bit allocation scheme according to the invention. 

[039] The problem is formally stated as follows. A target bit rate constraint is $R_T$. A 
total distortion is $D$, which is measured as a mean squared error (MSE). Given these 
parameters, it is desired to minimize the distortion, subject to the target rate 
constraint, i.e., to solve 






MERL-1544 
Vetro et al. 

$$
\min D = \sum_{k=1}^{K} d_k(\omega_k)
\quad \text{subject to} \quad
\sum_{k=1}^{K} r_k(\omega_k) \le R_T, \tag{1}
$$

[040] where $d_k$ is the distortion caused by each of the three components $k \in K$ for 
$k = 1, 2, 3$, $r_k$ is the rate of each component, and $\omega_k$ are the specific parameters used 
in the allocation, e.g., quantization parameters, resynchronization marker spacing, 
and intra-refresh rate. 



[041] One way to solve the above problem is through a Lagrangian optimization 
approach in which the following quantity is minimized: 

$$
\sum_{k=1}^{K} d_k(\omega_k) + \lambda \sum_{k=1}^{K} r_k(\omega_k), \tag{2}
$$

[042] where $\lambda$ is the Lagrangian multiplier to be determined during the optimization. 
A bisection process can be used to obtain the optimal multiplier used to solve this 
problem. However, that process is iterative and computationally expensive. Also, 
obtaining accurate R-D sample points required by the optimization procedure is still 
an open issue. 
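
A minimal sketch of the Lagrangian approach with a bisection search on the multiplier is given below. It assumes that each of the three components is described by a small, hypothetical list of (rate, distortion) operating points; the function names `pick` and `allocate`, the starting upper bound on the multiplier, and the iteration count are illustrative choices and are not taken from the invention.

```python
from typing import List, Sequence, Tuple

# Each operating point is (rate, distortion) for one setting of one component.
Point = Tuple[float, float]


def pick(points: Sequence[Point], lam: float) -> Point:
    """Return the point that minimizes d + lam * r for one component."""
    return min(points, key=lambda p: p[1] + lam * p[0])


def allocate(components: Sequence[Sequence[Point]], rate_budget: float,
             lam_hi: float = 1e6, iters: int = 60) -> List[Point]:
    """Bisect on lam until the selected points fit the rate budget."""
    lam_lo = 0.0
    best = [pick(c, lam_hi) for c in components]
    if sum(p[0] for p in best) > rate_budget:
        raise ValueError("budget is infeasible even at the largest multiplier")
    for _ in range(iters):
        lam = 0.5 * (lam_lo + lam_hi)
        choice = [pick(c, lam) for c in components]
        if sum(p[0] for p in choice) <= rate_budget:
            best, lam_hi = choice, lam   # feasible: try a smaller rate penalty
        else:
            lam_lo = lam                 # over budget: penalize rate more
    return best
```

Each bisection step re-evaluates every component, which illustrates why the procedure becomes expensive when the operating points themselves must be obtained by repeated transcoding.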



[043] It is preferred to use a distinct R-D model for each of the three components so 
that the optimization does not have to obtain the actual R-D values from simulation. 
With these models, some of the computational burden for solving the above problem 
is alleviated. However, this solution is relatively complex. Therefore, an alternative 
method that can solve the bit allocation problem with similar performance, but with 
a much lower complexity, is sought and described as part of this invention. 

[044] Video Source Requantization Model 

[045] Our R-D model for a coded video source operates on groups-of-pictures 
(GOPs). This accounts for inter-frame dependency by considering the requantization 
distortion in the current frame that propagates to the next frame through motion 
compensation. The R-D model is then modified accordingly for the next frame to 
account for this error propagation effect. 

[046] If a composite signal, such as the output video 102 is decomposed into 
independent components, i.e., the requantized video, the resynchronization markers, 
and the intra-refresh blocks, then a composite R-D model can be derived directly 
from the three individual R-D models. Furthermore, if the signal can be decomposed 
into independent identically distributed (i.i.d.) Gaussian sources with energy 
compact transforms, such as the DCT, then the total distortion D of the signal 
caused by the encoding can be modeled as: 

$$
D = \left[ \prod_{i=0}^{L-1} \Phi(\omega_i) \right]^{1/L} e^{-\beta R(D)}, \tag{3}
$$

[047] where $L$ is the total number of frequency coefficients in the case of DCT, 
$\Phi(\omega_i)$ is the power spectrum density function of coefficient $i$, $R$ is the bit rate of the 
signal, and the constant parameter $\beta$ is $2 \ln 2$. An interesting observation from this 
result is that the exponential function of rate is proportional to the product of the 
coefficient variances rather than the sum of variances. 

[048] The above model is only accurate for Gaussian sources with fine quantization. 
It is known that a video source can be characterized more accurately by a 
generalized Gaussian model. Furthermore, a video source often needs to go through 
coarse requantization during transcoding to adapt to lower bandwidth constraints. 

[049] The following modifications are made to the model to accommodate these 
two issues. First, the parameter $\beta$ is made variable, rather than a fixed value, and 
second, $R(D)$ is replaced by $R^{\gamma}(D)$. 

[050] Furthermore, if the value $\left[ \prod_{i=0}^{L-1} \Phi(\omega_i) \right]^{1/L}$ is replaced by $\sigma^2$, the total 
variance of the signal, then 

$$
D = \sigma^2 e^{-\beta R^{\gamma}(D)}. \tag{4}
$$

[051] Experimental data indicate that $\beta$ is usually in the range of [1, 10], and $\gamma$ is in 
the range of [0, 1]. Then, for requantizing intra-coded frames, the distortion is 

$$
D_0 = \sigma_0^2 e^{-\beta_0 R_0^{\gamma_0}}, \tag{5}
$$

[052] where $D_0$ is the distortion of the intra-coded frame caused by requantization, 
and $R_0$ is the rate. The intra-coded variance, $\sigma_0^2$, can be estimated in the frequency 
domain. 

[053] It is possible to estimate the model parameters $\beta$ and $\gamma$ from two sample points 
on the R-D curve, as described herein. 
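
The two-point estimate can be sketched as follows, assuming the variance is already known (for example, estimated in the frequency domain) and that both sample distortions are below that variance. Taking logarithms of the model twice gives a closed form for $\gamma$ and then $\beta$. The function names and the choice of rate unit are illustrative assumptions, not part of the invention.

```python
import math
from typing import Tuple


def fit_beta_gamma(var: float, p1: Tuple[float, float],
                   p2: Tuple[float, float]) -> Tuple[float, float]:
    """Fit D = var * exp(-beta * R**gamma) through two (R, D) samples."""
    (r1, d1), (r2, d2) = p1, p2
    a1 = math.log(var / d1)   # equals beta * r1**gamma
    a2 = math.log(var / d2)   # equals beta * r2**gamma
    gamma = math.log(a1 / a2) / math.log(r1 / r2)
    beta = a1 / r1 ** gamma
    return beta, gamma


def model_distortion(var: float, beta: float, gamma: float, rate: float) -> float:
    """Evaluate the R-D model at a given rate."""
    return var * math.exp(-beta * rate ** gamma)
```

The fit requires two distinct, positive rates; with noisy samples, a least-squares fit over more points may be preferable.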

[054] Without considering inter-frame dependency, a similar model can be used for 
inter-coded frames: 

$$
D_k = \sigma_k^2 e^{-\beta_k R_k^{\gamma_k}}, \quad k = 1, 2, \ldots, N-1, \tag{6}
$$

[055] where $N$ is the total number of frames in a GOP, $D_k$ is the distortion of the 
inter-coded frame caused by requantization, $R_k$ is the rate, and $\sigma_k^2$ is the variance of 
the input signal. Again, the model parameters $\beta$ and $\gamma$ can be estimated from two 
sample points on the R-D curve. 

[056] The inter-frame dependency is modeled by changing the frame variance $\sigma_k^2$ to 
$\sigma_k^{*2}$: 

$$
D_k = \sigma_k^{*2} e^{-\beta_k R_k^{\gamma_k}} = \left( \sigma_k^2 + \alpha_k D_{k-1} \right) e^{-\beta_k R_k^{\gamma_k}}, \quad k = 1, 2, \ldots, N-1, \tag{7}
$$

[057] where $\sigma_k^{*2} = \sigma_k^2 + \alpha_k D_{k-1}$ denotes the inter-frame variance, $D_{k-1}$ denotes an 
extra quantization residue error produced when the previous frame is requantized 
with a larger Q-scale, and $\alpha_k$ denotes a propagation ratio, which is determined by the 
amount of motion compensation. The term $\alpha_k D_{k-1}$ models the dependency between 
the current and the previous frame. This term captures the quantization error 
propagation effect caused by motion compensation. That is, when the previous 
frame is quantized coarsely, more quantization error propagates to the current frame 
through motion compensation. 
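
The recursion in Equations (5) and (7) can be illustrated with a short sketch that evaluates the requantization distortion of every frame in a GOP from per-frame model parameters. The function name and the list-based parameter layout are assumptions made for illustration only.

```python
import math
from typing import List, Sequence


def gop_distortions(var: Sequence[float], beta: Sequence[float],
                    gamma: Sequence[float], alpha: Sequence[float],
                    rate: Sequence[float]) -> List[float]:
    """Frame 0 is the I-frame; frames 1..N-1 are P-frames. alpha[0] is unused."""
    d: List[float] = []
    for k in range(len(rate)):
        # Equation (7): the previous frame's distortion inflates the variance.
        eff_var = var[k] + (alpha[k] * d[k - 1] if k > 0 else 0.0)
        d.append(eff_var * math.exp(-beta[k] * rate[k] ** gamma[k]))
    return d
```

Setting every propagation ratio to zero recovers the independent per-frame model of Equation (6), which makes the effect of the dependency term easy to inspect.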

[058] Model Parameter Estimation 

[059] Parameter estimation for the proposed R-D models is performed in two stages 
on a GOP basis. In the first stage, all the frames in the GOP are requantized with 
multiple sample quantization scales, e.g., 4, 8, and 31. For the P-frames, no motion 
compensation is performed. Using the three sample R-D points, the three parameters 
$\sigma_0^2$, $\beta_0$, and $\gamma_0$ that establish the model for the I-frame are determined from Equation (5). 
Similarly, the parameters $\sigma_k^2$, $\beta_k$, and $\gamma_k$ that establish the model for each P-frame 
are estimated from Equation (6) without taking the propagation effect into account, 
i.e., the $\sigma_k^2$ that is estimated here denotes the variance of the input signal. 

[060] The second stage accounts for propagation effects in the model parameter 
estimates for the P-frames by determining $\alpha_k$. To do this, first requantize the I-frame 
at a different quantization scale than used in the first stage, e.g., $Q_j = 14$. Second, 
requantize the P-frames at a different quantization scale while performing motion 
compensation to account for the propagation effects. With one sample point in a 
P-frame, the parameter $\sigma_k^{*2}$ can be estimated from Equation (7). Then, from Equation 
(7), where $\sigma_k^{*2} = \sigma_k^2 + \alpha_k D_{k-1}$, determine $\alpha_k$ by: 

$$
\alpha_k = \frac{\sigma_k^{*2} - \sigma_k^2}{D_{k-1}}, \tag{8}
$$

[061] where $D_{k-1}$ is the distortion of the previous frame. 
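
The second-stage estimate can be sketched as below, assuming that $\beta_k$, $\gamma_k$ and the input variance were already obtained in the first stage. The inter-frame variance is recovered by inverting Equation (7) at one sample point, and the propagation ratio then follows from Equation (8). The function names are illustrative.

```python
import math


def inter_frame_variance(d_k: float, beta_k: float, gamma_k: float,
                         rate_k: float) -> float:
    """Invert Equation (7) at one (rate, distortion) sample of a P-frame."""
    return d_k * math.exp(beta_k * rate_k ** gamma_k)


def propagation_ratio(d_k: float, beta_k: float, gamma_k: float,
                      rate_k: float, var_k: float, d_prev: float) -> float:
    """Equation (8): excess variance divided by the previous frame's distortion."""
    return (inter_frame_variance(d_k, beta_k, gamma_k, rate_k) - var_k) / d_prev
```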

[062] The parameters $\gamma_k$ and $\alpha_k$ are relatively constant within a given sequence. 
Therefore, it is sufficient to estimate these parameters only once at the start of a 
sequence, or if a scene change is detected. For parameters that are more sensitive to 
the scene content, e.g., $\sigma_k^2$ and $\beta_k$, their values are updated for each frame. The 
advantage of this simplification is that after $\gamma_k$ and $\alpha_k$ are estimated at the start, the 
transcoding only needs to be performed once to determine the model parameters, 
instead of twice. The parameter $\{\sigma_k^2\}$ is estimated from the variance of the DCT 
coefficients as expressed in Equation (4), and $\{\beta_k\}$ is estimated from one R-D 
sample point, which is easily obtained by requantizing the current frame. 

[063] Error-Resilience R-D Models 

[064] This section describes the second and third rate-distortion models that 
improve error-resilience, i.e., resynchronization marker insertion and intra-block 
refresh. First, a transmission environment is described, including the system 
structure, type of channel, and methods of error concealment. Then, the distortion 
models for resynchronization and intra-block insertion (intra-refresh) are described. 
Here, the focus is on the distortion models, because the rate estimates are obtained 
in a rather straightforward manner. Specifically, the rate consumed by 
resynchronization markers can be determined from the number of bits in the 
resynchronization header and the resynchronization marker spacing, while the rate 
consumed by intra-refresh can be determined from the intra-refresh rate and the 
average rate increase by replacing an inter-coded MB with an intra-coded MB. 
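
As a rough sketch of these two rate estimates, the functions below count one header per video packet and charge each refreshed MB its average extra cost. Rounding the packet count up, and all function and parameter names, are assumptions made for illustration.

```python
import math


def resync_overhead_bits(payload_bits: int, spacing_bits: int,
                         header_bits: int) -> int:
    """Bits spent on resynchronization headers for one frame or GOP."""
    packets = math.ceil(payload_bits / spacing_bits)
    return packets * header_bits


def intra_refresh_overhead_bits(mbs_per_frame: int, refresh_rate: float,
                                extra_bits_per_mb: float) -> float:
    """Average extra bits per frame from coding a fraction of MBs as intra."""
    return mbs_per_frame * refresh_rate * extra_bits_per_mb
```

Halving the marker spacing roughly doubles the header overhead, which is the trade-off the bit allocation has to weigh against the reduced loss per packet.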

[065] System Structure 

[066] Figure 3 shows a system 300 for transmitting and receiving a video bitstream 
via a noisy channel. Audio data 301 is generated and multiplexed with encoded 
video data 302. The data are transmitted 310 according to the H.324M standard 
defined for a typical mobile terminal, and an AL3 TransMux defined in Annex B of 
the H.223 standard. A 16-bit and an 8-bit cyclic redundancy code (CRC) are used 
for error detection in the video and audio payloads, respectively. For video 
packetization, a packet structure described in the MPEG-4 resilience tool is used. 
This structure provides resynchronization at approximately the same number of bits. 
In this way, a typical video packet has seven bytes overhead in total, including two 
bytes for control, three bytes for header, and two bytes for the CRC checksum. A 
maximum video packet payload length is 254 bytes. 

[067] A wireless channel 320 is represented according to a binary symmetric 
channel (BSC) model, which assumes independent bit error 321 in a bitstream. For 
error detection, recovery and concealment in the video receiver 330, it is assumed 
that after an error is detected, either by a CRC checksum or by a video syntax check, 
the entire video packet containing the error is discarded, and the lost MBs are 
concealed. This is done to avoid disturbing visual effects caused by decoding 
erroneous packets. The receiver recovers the audio signal 303 and the video signal 
using a video decoder 304. 

[068] Other errors that can be detected include illegal VLC, semantic error, 
excessive DCT coefficients (> 64) in an MB, and inconsistent resynchronization 
header information, e.g., QP out of range, MBA(k) < MBA(k-1), etc. The error is 
recovered by resynchronizing to the added packet resynchronization markers or to 
the frame headers. 

[069] For error concealment, both spatial and temporal error concealment methods 
are employed, using a simple block replacement scheme. 

[070] As shown in Figure 4, a spatial concealment method is employed for a lost 
MB 401 in an intra-coded frame. The concealment is performed by copying the MB 
from its immediate upper neighbor 402. 

[071] Similarly, temporal concealment is employed for a lost MB 410 in an 
inter-coded frame. Here, the motion vector 414 of the lost MB 410 is set to be the median 
of the motion vectors selected from three specific neighbors, i.e., blocks labeled a 
411, b 412, and c 413 as shown in Figure 4. The MB in the previous frame 415 that 
this motion vector is referencing is copied to the current location to recover the lost 
block 410. 
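
A minimal sketch of this temporal concealment is shown below. It assumes a component-wise median of the three neighboring motion vectors, integer-pel vectors given as (dx, dy), a 16x16 MB, frames stored as lists of pixel rows, and clamping at the frame border; none of these details is specified above, and the function names are illustrative.

```python
from statistics import median
from typing import List, Tuple

MB = 16  # assumed macroblock size in pixels
Vector = Tuple[int, int]


def concealment_vector(a: Vector, b: Vector, c: Vector) -> Vector:
    """Component-wise median of the three neighboring motion vectors."""
    return (median([a[0], b[0], c[0]]), median([a[1], b[1], c[1]]))


def conceal_temporal(prev: List[List[int]], cur: List[List[int]],
                     mb_row: int, mb_col: int, mv: Vector) -> None:
    """Copy the MB that mv references in prev into the lost MB of cur."""
    height, width = len(prev), len(prev[0])
    top, left = mb_row * MB, mb_col * MB
    for y in range(MB):
        for x in range(MB):
            # Clamp the reference position so it stays inside the frame.
            sy = min(max(top + y + mv[1], 0), height - 1)
            sx = min(max(left + x + mv[0], 0), width - 1)
            cur[top + y][left + x] = prev[sy][sx]
```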

[072] It is noted that the error-resilience models described in this invention also 
apply to other prior art error concealment schemes as well. 

[073] Overall Distortion from Channel Error 

[074] Figures 5 and 6 show the decomposition of the overall distortion for I- and P- 
frames caused by channel errors. A rectangle 501 denotes the set of all the MBs in 
an I-frame, while a rectangle 601 denotes the set of all MBs in a P-frame. 

[075] For I-frames, distortion comes from lost intra-coded MBs (LS) 502, which 
are spatially concealed. For P-frames, distortion comes from two parts: distortion 
from lost MBs (L) 602, and distortion propagated from previously corrupted MBs 
through motion compensation, which are referred to as MC MBs 603. The lost MBs 
can be further decomposed into two categories: inter-coded MBs (LT) 604 that are lost and 
concealed with temporal concealment, and inter-coded MBs (LTC) 605 that are lost and 
concealed with temporal concealment, but whose replacements were themselves 
corrupted. Note that LTC MBs define the intersection of L MBs and MC MBs. The 
MCC MBs 606 refer to the MBs that are received correctly, but reference 
previously corrupted MBs through motion compensation. 

[076] If the number of MBs lost in a frame is $Y_l$, the number of MBs corrupted 
through motion compensation is $Y_{mc}$, and the total number of MBs in a frame is $M$, 
then the average number of corrupted MBs in a frame $E[Y]$ can be expressed as: 

$$
E[Y] = E[Y_l] + E[Y_{mc}] - E[Y_{ltc}], \tag{9}
$$

where $Y_{ltc} = Y_l \cap Y_{mc}$. This intersection is proportional to the number of lost MBs 
and the number of inter-coded MBs corrupted through motion compensation, and 
subsequently, 

$$
E[Y_{ltc}] = E[Y_l \cap Y_{mc}] = \frac{E[Y_l] \cdot E[Y_{mc}]}{M}, \tag{10}
$$

[077] and the total average distortion, measured in MSE, can therefore be calculated 
by: 

$$
D = \begin{cases}
\dfrac{1}{M} \left\{ E[Y_l] \cdot D_s \right\} & \text{for an I-frame} \\[1ex]
\dfrac{1}{M} \left\{ E[Y_{lt}] \cdot D_t + E[Y_{ltc}] \cdot D_{tc} + E[Y_{mcc}] \cdot D_{mc} \right\} & \text{for a P-frame,}
\end{cases} \tag{11}
$$

[079] where $D_s$ is the average spatial concealment distortion, $D_t$ is the average 
temporal concealment distortion when copying a correct MB from the previous 
frame, $D_{tc}$ is the average temporal concealment distortion when copying a corrupted MB from 
the previous frame, and $D_{mc}$ is the average distortion of correctly received MBs 
referencing corrupted MBs through motion compensation. The number of MCC 
MBs is $Y_{mcc}$ as shown in Figure 5. 
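
The decomposition can be illustrated with a short sketch. It assumes, based on the LT/LTC/MCC categories described above, that the LT count is the lost count minus the LTC count, that the MCC count is the MC count minus the LTC count, and that the frame distortion is normalized by the number of MBs $M$. The function names are hypothetical.

```python
def expected_corrupted(m: int, e_lost: float, e_mc: float) -> float:
    """Equations (9) and (10): lost plus MC-corrupted minus their overlap."""
    return e_lost + e_mc - e_lost * e_mc / m


def i_frame_distortion(m: int, e_lost: float, d_s: float) -> float:
    """I-frame case of Equation (11): only spatially concealed losses."""
    return e_lost * d_s / m


def p_frame_distortion(m: int, e_lost: float, e_mc: float,
                       d_t: float, d_tc: float, d_mc: float) -> float:
    """P-frame case of Equation (11) under the assumptions stated above."""
    e_ltc = e_lost * e_mc / m   # Equation (10)
    e_lt = e_lost - e_ltc       # lost, concealed from a correct MB (assumed)
    e_mcc = e_mc - e_ltc        # received correctly, corrupted reference (assumed)
    return (e_lt * d_t + e_ltc * d_tc + e_mcc * d_mc) / m
```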

[080] Techniques to determine each quantity in the above equation are described 
below. There are two categories of quantities: distortion related to concealing lost 
MBs, and distortion related to error propagation as a result of motion compensation. 

[081] Error Concealment Distortion 



[082] The probability $p_l$ that one MB is lost in a video frame $n$ can be modeled by 
the probability $p_{sl}$ that a video packet is lost. If the channel bit error rate (BER) is $P_e$, 
and an average video packet length in bits is $L_s$, then 

$$
p_l = p_{sl} = 1 - (1 - P_e)^{L_s}. \tag{12}
$$

[083] It follows that the average number of lost MBs $E[Y_l(n)]$ in frame $n$ is $p_l \cdot M$. 
The distortion caused by losing one MB can be calculated according to one of the 
three situations: 

the loss of an intra-coded MB that is spatially concealed, resulting in 
distortion $D_s$; 

the loss of an inter-coded MB that is temporally concealed by copying a 
non-corrupted MB from the previous frame, resulting in distortion $D_t$; and 

the loss of an inter-coded MB that is temporally concealed by copying a 
corrupted MB from the previous frame, resulting in distortion $D_{tc}$. 

[084] The values $D_s$ and $D_t$ can be estimated by calculating pixel differences 
between the lost MB and the replacement MB. The value $D_{tc}$ can be approximated 
by adding the motion compensation corruption to $D_t$, e.g., $D_{tc} = D_t + D_{mc}$. 
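
Equation (12) and the expected number of lost MBs can be evaluated as in the sketch below. The use of `log1p` and `expm1` is an implementation choice to keep the result accurate for very small bit error rates; the function names are illustrative.

```python
import math


def packet_loss_probability(ber: float, packet_bits: int) -> float:
    """Equation (12): 1 - (1 - ber)**packet_bits, computed stably for small ber."""
    return -math.expm1(packet_bits * math.log1p(-ber))


def expected_lost_mbs(ber: float, packet_bits: int, mbs_per_frame: int) -> float:
    """Average number of lost MBs in a frame: loss probability times M."""
    return packet_loss_probability(ber, packet_bits) * mbs_per_frame
```

Because the loss probability grows with packet length, shorter marker spacing lowers the expected number of lost MBs at the cost of more header bits.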

[085] Error Propagation Distortion 

[086] A Markov model can be used to estimate error propagation by motion 
compensation. A Markov model is appropriate because the number of 
MBs corrupted through motion compensation in the current frame depends only on 
the motion vectors in the current frame and the number of corrupted MBs in the 
previous frame. The probability that a single MB is corrupted through motion 
compensation can be determined by: 

$$
p_{mc} = p\,\theta_1 + \left[ 1 - (1 - p)^2 \right] \theta_2 + \left[ 1 - (1 - p)^4 \right] \theta_3, \tag{13}
$$

[087] where $p$ is the probability of one MB being corrupted in the previous frame, 
$\theta_1$ denotes the proportion of MBs in the current frame that reference a single MB, 
$\theta_2$ denotes the proportion of MBs that reference two MBs, and $\theta_3$ denotes the 
proportion of MBs that reference four MBs in the previous frame. If the proportion 
of intra-coded MBs is denoted $\eta$, then $\theta_1 + \theta_2 + \theta_3 + \eta = 1$. From this relation, it is 
clear that a higher value of $\eta$ yields a lower value of $p_{mc}$. 

[088] Then, a probability transition matrix that characterizes the error propagation 
through motion compensation can be calculated by: 

$$
P(i, j_{mc}) = P\{Y_{mc}(n) = j_{mc} \mid Y(n-1) = i\} = \binom{M}{j_{mc}}\, p_{mc}^{\,j_{mc}} (1 - p_{mc})^{M - j_{mc}}, \tag{14}
$$



[090] where $j_{mc}$ is the number of MBs corrupted through motion compensation in 
frame $n$, and $i$ is the total number of MBs corrupted in frame $n - 1$. An $n$-step 
probability transition matrix $P^n$ is: 

$$
P^n = \prod_{k=1}^{n} P_k, \tag{15}
$$

where 

$$
P^n(i, j_{mc}) = P\{Y_{mc}(n) = j_{mc} \mid Y(0) = i\}. \tag{16}
$$



[091] $P_k$ is the 1-step Markov transition matrix for frame $k$. The average number of 
corrupted MBs through motion compensation in frame $n$ can be obtained by 

$$
E[Y_{mc}(n)] = \sum_{i=0}^{M} \sum_{j_{mc}=0}^{M} j_{mc}\, p_0(i)\, P^n(i, j_{mc}), \tag{17}
$$

[092] where $p_0(i)$ is the probability of $i$ MBs being corrupted in the first frame. 



[093] The above model is computationally complex, and is therefore simplified 
by using a 1-step Markov model instead of an $n$-step Markov model, and by using $E[Y(n)]$ 
to replace $i$ in Equation (14). Therefore, Equation (17) becomes 

$$
E[Y_{mc}] = M \cdot p_{mc}. \tag{18}
$$

[094] It follows that the average distortion due to motion compensation at frame $n$ 
can be expressed by 

$$
D_{mc}(n) = p \cdot (1 - \eta) \cdot D(n - 1), \tag{19}
$$

[095] where $D(n - 1)$ is the average distortion of frame $n - 1$. 
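
The simplified one-step model can be sketched as a per-frame recursion. The sketch assumes that the probability $p$ used for each frame is the expected corrupted fraction of the previous frame, obtained by dividing Equations (9), (10) and (18) by $M$, and that the motion statistics are the same for every P-frame. These assumptions and the function names are illustrative and are not stated above.

```python
from typing import List, Tuple


def p_mc(p: float, thetas: Tuple[float, float, float]) -> float:
    """Equation (13) for reference proportions (theta1, theta2, theta3)."""
    t1, t2, t3 = thetas
    return p * t1 + (1 - (1 - p) ** 2) * t2 + (1 - (1 - p) ** 4) * t3


def corruption_trace(p_loss: float, thetas: Tuple[float, float, float],
                     n_frames: int) -> List[float]:
    """Expected corrupted fraction per frame; frame 0 is the I-frame."""
    probs = [p_loss]
    for _ in range(1, n_frames):
        pmc = p_mc(probs[-1], thetas)
        # Lost or MC-corrupted, minus the overlap of the two sets.
        probs.append(p_loss + pmc - p_loss * pmc)
    return probs


def mc_distortion(p: float, eta: float, d_prev: float) -> float:
    """Equation (19): distortion propagated through motion compensation."""
    return p * (1.0 - eta) * d_prev
```

With no intra refresh the corrupted fraction grows from frame to frame, whereas with all MBs intra-coded it stays at the packet loss level.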

[096] Model Accuracy 

[097] Figure 7 compares the accuracy of the R-D model for resynchronization 
marker insertion as a function of marker spacing or video packet length. The rate 
change of inserted resynchronization markers comes from the change of marker 
spacing or packet length in a range of [130, 1300] bits. The test is performed with a 
channel BER = $10^{-4}$. 

[098] Figure 8 shows a test of the intra-refresh R-D model as a function of intra- 
refresh rate. The intra-refresh rate varies from 2% to 90%. From these figures, it 
can be seen that the error-resilience models according to the invention predict 
accurately the actual distortion. 

[099] Bit Allocation 

[0100] Based on the above described R-D models for video source 
requantization, resynchronization marker insertion, and intra-refresh, it is now 
possible to solve the R-D optimized bit allocation problem. Then, the resulting 
optimal source R-D curve can be used in the overall bit allocation for error resilient 
coding. Based on the overall optimal bit allocation scheme, a sub-optimal scheme 
to enable transcoding with lower complexity, but achieving similar performance, is 
described. 






[0101] Optimized Rate Allocation- Source Requantization Only 



[0102] With the R-D model for video source requantization, optimal bit 
allocation 120 can be achieved for a given rate budget R. Specifically, a solution to 
the following problem is sought: 

$$
\min \sum_k D_k
\quad \text{subject to} \quad
\sum_k R_k \le R \ \text{ and } \ R_{kl} \le R_k \le R_{ku}, \quad k = 0, 1, \ldots, N-1, \tag{20}
$$

where $R_{kl}$ and $R_{ku}$ are the lower and upper bounds of the achievable rate for 
frame $k$. 

[0103] For an I-frame, $R_{kl}$ and $R_{ku}$ can be determined by the minimum and 
maximum allowable quantization scale. For a P-frame $k$, $R_{kl}$ is achieved by 
assigning a minimum quantization scale to all its previous frames (0 to $k-1$) and 
the maximum allowable quantization scale to the current frame. On the other hand, 
$R_{ku}$ is obtained by assigning a maximum allowable quantization scale to all its 
previous frames and the minimum quantization scale to the current frame. In 
practice, $R_{ku}$ can be estimated by coding all the MBs in the current frame with intra 
mode. 



[0104] There are several known methods to solve the above optimization 
problem, e.g., a dynamic programming approach based on the Lagrange multiplier 
and a trellis. The problem with that approach is that as the number of frames 
increases, the trellis grows exponentially and the size of the problem quickly 
becomes intractable. Another issue is that the Lagrange multiplier needs to be 
determined by traversing the trellis tree iteratively, which further complicates the 
problem. An alternative approach incorporates a penalty function into the
minimization problem. However, that iterative approach is relatively complex. 
Both approaches assume that the actual R-D values at various operating points are 
readily available, which may not be the case in practical applications. 

[0105] The method according to the invention is based on a projected
Newton method, see Bertsekas, "Projected Newton methods for optimization 
problems with simple constraints," Tech. Rep. LIDS R-1025, MIT, Cambridge, 
MA, 1980, incorporated herein by reference. 

[0106] In order to use that method, the problem in Equation (20) needs to be
modified. First, the minimum distortion occurs when $\sum_k R_k = R$, i.e., the
optimal solution always uses the entire available bit budget. Second, in most
practical cases the available bit budget is low, so the rate upper bound $R_{ku}$
is rarely exceeded. Thus, the upper bound can be eliminated. Given this, the new
constrained problem is written as:

$$\min \sum_k D_k \quad \text{subject to} \quad \sum_k R^*_k = R^* \ \text{ and } \ R^*_k \ge 0, \quad k = 0, 1, \ldots, N-1 \tag{21}$$

[0107] where the lower bound $R_{kl}$ is eliminated by substituting $R_k$ with
$R^*_k + R_{kl}$, where $R^* = R - \sum_k R_{kl}$.
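The variable substitution of Equation (21) can be illustrated with a minimal sketch. The function name, the helper it returns, and the numeric values below are hypothetical and serve only to show how the lower bounds are absorbed into a reduced budget.

```python
def shift_to_nonnegative(rate_lower, total_rate):
    """Substitute R_k = R*_k + R_kl so the lower bounds become R*_k >= 0."""
    reduced_budget = total_rate - sum(rate_lower)  # R* = R - sum_k R_kl
    if reduced_budget < 0:
        raise ValueError("budget is below the sum of the per-frame lower bounds")

    def to_original(shifted_rates):
        # Map a feasible point of Equation (21) back to the rates of Equation (20).
        return [r_star + r_low for r_star, r_low in zip(shifted_rates, rate_lower)]

    return reduced_budget, to_original


# Hypothetical numbers: three frames with lower bounds of 10, 5 and 5, budget 100.
budget_star, to_original = shift_to_nonnegative([10.0, 5.0, 5.0], 100.0)
rates = to_original([40.0, 25.0, 15.0])  # any R*_k >= 0 summing to budget_star
```

Any solver that handles the simple constraints of Equation (21) can work on the shifted rates and convert the result back with `to_original`.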

[0108] One advantage of this method is that no additional parameters need to
be introduced, such as a Lagrangian multiplier. The constraints are handled
implicitly within the method by variable substitution and linear projection.
Therefore, this method is comparable to its unconstrained counterpart. Another
advantage of the method is that it uses Hessian information to improve the
convergence. Therefore, the resulting Newton-like method typically has a
superlinear rate of convergence and is considerably faster than prior art methods.
With this method, the size of the problem can be increased considerably without
increasing the computational time.

[0109] R-D Derivative Equalization

[0110] To provide a low-complexity implementation for the bit allocation, a 
technique to determine a suboptimal operating point is described. This technique is 
basically an R-D derivative equalization scheme. This scheme is based on the fact 
that optimal bit allocation is achieved at the point where the slopes of the R-D 
function for each component are equalized, i.e., made substantially the same. 

[0111] Starting from an operating point close to an optimal point, the
objective is to continually adjust the operating point in the direction of the optimal
point. To achieve this, there are two steps:

start from an operating point close to the optimal point, and

move towards the optimal point and remain at that point, given changes in
video content and channel conditions.

[0112] The first step is not very difficult because the initial optimization only
needs to be done with the first GOP. The second step uses the following R-D
derivative equalization scheme. Specifically, examine a local derivative of each
R-D curve and adjust the bits allocated to each component accordingly. If the rate
budget is constant, then reallocating a change in rate $\Delta R$ from the component
with the smallest absolute derivative value to the component with the largest
absolute derivative value is a good approximation to the optimal solution.
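A single reallocation step of this kind can be sketched as follows. The function name, argument layout, and example numbers are assumptions made for illustration; the specification does not prescribe a particular implementation.

```python
def equalization_step(rates, slopes, delta):
    """Move `delta` bits from the flattest R-D curve to the steepest one.

    rates  -- current bit allocation per component
    slopes -- local derivatives dD_i/dR_i (negative for decreasing curves)
    delta  -- amount of rate to move; the total rate is unchanged
    """
    magnitudes = [abs(s) for s in slopes]
    steepest = magnitudes.index(max(magnitudes))  # gains most from extra bits
    flattest = magnitudes.index(min(magnitudes))  # loses least when bits are removed
    new_rates = list(rates)
    if steepest != flattest:
        new_rates[steepest] += delta
        new_rates[flattest] -= delta
    return new_rates


# Hypothetical allocation for three components with a constant budget of 100.
updated = equalization_step([60.0, 25.0, 15.0], [-0.2, -0.9, -0.5], 2.0)
# updated == [58.0, 27.0, 15.0]
```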

[0113] Bit Allocation Strategy 

[0114] In order to evaluate the rate allocation strategy as described above, the
following ancillary models are provided. The number of transcoding
components is N, with component i operating at bit rate $R_i$ and a distortion $D_i$.
The total distortion is given by $D = \sum_i D_i(R_i)$, and a total rate is given by
$\sum_i R_i$. We assume that all R-D functions are convex, and

[0115] $\partial D_i / \partial R_i < 0$, for all i = 1, ..., N.

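[0116] In one interpretation of the problem, we are given an additional rate
$\Delta R > 0$. The goal is to allocate $\Delta R$ among the components so that the
total distortion D is maximally decreased. If $\Delta R$ is relatively small, then the
total change in distortion, $\Delta D$, can be expressed as:

$$\Delta D = \sum_i \frac{\partial D_i}{\partial R_i}\,\Delta R_i \;\ge\; \frac{\partial D_k}{\partial R_k} \sum_i \Delta R_i = \frac{\partial D_k}{\partial R_k}\,\Delta R,$$

$$\text{where } \left|\frac{\partial D_k}{\partial R_k}\right| \ge \left|\frac{\partial D_i}{\partial R_i}\right| \text{ and } \frac{\partial D_i}{\partial R_i} < 0 \quad \forall i = 1, \ldots, N. \tag{22}$$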

[0117] In the above equation, the derivative $\partial D_i / \partial R_i$ is replaced by
the highest absolute value of derivative $\partial D_k / \partial R_k$, because
$\partial D_i / \partial R_i < 0$. Therefore, the allocation scheme that best minimizes
$\Delta D$, or maximizes $|\Delta D|$, because $\Delta D < 0$, allocates all the
additional bits to component k.

[0118] In a second interpretation of the problem, we decrease the total rate R

(23)

[0119] In the above equation, the derivative $\partial D_i / \partial R_i$ is replaced by
the lowest absolute value of derivative $\partial D_l / \partial R_l$. Therefore, the best
bit allocation scheme that minimizes $\Delta D$ decreases the rate of component l
by $\Delta R$.

[0120] In a third interpretation of the problem, we reallocate bits among the
transcoding components without increasing or decreasing the total rate. To achieve
this, we increase the rate of some components. We denote this group with current
operation rate $R_{i_k}$ and distortion $D_{i_k}$, where $i_k \in [1, N]$. We also
decrease the rate of the remaining components. We denote this group with current
operation rate $R_{i_l}$ and distortion $D_{i_l}$, where $i_l \in [1, N]$. The rate
increase $\Delta R_{i_k}$ and the rate decrease $\Delta R_{i_l}$ should satisfy the
three conditions below:

$$\text{(i)}\ \sum_{i_k} \Delta R_{i_k} = \Delta R, \qquad \text{(ii)}\ \sum_{i_l} \Delta R_{i_l} = -\Delta R, \qquad \text{(iii)}\ \Delta R > 0, \tag{24}$$

[0121] where $\Delta R$ is the total rate adjustment. Then, the total change in
distortion can be expressed as:

(25)



[0122] From the above equation, it can be seen that the optimal bit
reallocation scheme to minimize distortion should be the one that deducts
$\Delta R$ only from the component with the smallest absolute derivative value, and
adds $\Delta R$ only to the component with the largest absolute derivative value.
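One first-order argument consistent with this conclusion is sketched below. It is an illustrative derivation under stated assumptions, not a reproduction of the specification's own equation: every $\Delta R_{i_k} \ge 0$, every $\Delta R_{i_l} \le 0$, all derivatives are negative, k denotes the component with the largest absolute derivative, and l the component with the smallest.

```latex
\begin{align*}
\Delta D
  &= \sum_{i_k} \frac{\partial D_{i_k}}{\partial R_{i_k}}\,\Delta R_{i_k}
   + \sum_{i_l} \frac{\partial D_{i_l}}{\partial R_{i_l}}\,\Delta R_{i_l} \\
  &\ge \frac{\partial D_k}{\partial R_k} \sum_{i_k} \Delta R_{i_k}
   + \frac{\partial D_l}{\partial R_l} \sum_{i_l} \Delta R_{i_l} \\
  &= \left( \frac{\partial D_k}{\partial R_k} - \frac{\partial D_l}{\partial R_l} \right) \Delta R
   = -\left( \left|\frac{\partial D_k}{\partial R_k}\right|
           - \left|\frac{\partial D_l}{\partial R_l}\right| \right) \Delta R \;\le\; 0 .
\end{align*}
```

The bound is attained when the whole of $\Delta R$ is added to component k and taken from component l, which is the reallocation described above.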

[0123] An additional point that needs to be addressed here is the optimal
value of $\Delta R$. Because the ordering of the derivatives
$\partial D_i / \partial R_i$ for i = 1, ..., N should not change, we select the largest
possible value that keeps Eqs. (22), (23) and (25) valid.



[0124] This method has a lower cost than the global optimal method. The 
entire R-D curve for each encoding component is not required. In this embodiment, 
two local sample points on the R-D curve can be used to perform a discrete 
differentiation. 
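The discrete differentiation from two local sample points can be sketched as a finite difference. The function name and the sample values are hypothetical.

```python
def local_slope(point_a, point_b):
    """Approximate dD/dR from two nearby (rate, distortion) samples."""
    (rate_a, dist_a), (rate_b, dist_b) = point_a, point_b
    if rate_a == rate_b:
        raise ValueError("the two samples must have different rates")
    return (dist_b - dist_a) / (rate_b - rate_a)


# Two hypothetical operating points on one component's R-D curve.
slope = local_slope((100.0, 40.0), (110.0, 37.0))  # about -0.3
```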



[0125] Sub-Optimal Bit Allocation Procedure 



[0126] The following procedures are implemented to facilitate a
low-complexity transcoding operation. For the first GOP of the video sequence, the
model parameters are estimated and the R-D models for the video source
requantization, resynchronization marker insertion and intra-refresh are
established.

[0127] Then, optimal bit allocation can be achieved for this GOP through
the optimization process described above. For each subsequent GOP,
simplified parameter estimation procedures are used to generate two local
operation points. Then, a local derivative is obtained by discrete differentiation. If
local derivatives of the three R-D curves are equal, then the current bit allocation is
retained. Otherwise, the bit allocation of the component with the largest absolute
value local derivative is increased, and the bit allocation of the component
with the lowest absolute value local derivative is decreased.
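The per-GOP update can be simulated on toy data to see how the allocation drifts toward equal slopes. Everything in this sketch is an assumption made for illustration: the exponential curves `D_i(R) = a_i * exp(-R / b_i)`, their parameters, the probe size, the fixed step `delta`, and the tolerance used to decide that the slopes are "equal".

```python
import math

# Hypothetical convex, decreasing R-D curves, one per component
# (requantization, resynchronization, intra-refresh): D_i(R) = a_i * exp(-R / b_i).
CURVES = [(90.0, 40.0), (30.0, 15.0), (50.0, 25.0)]


def distortion(i, rate):
    a, b = CURVES[i]
    return a * math.exp(-rate / b)


def reallocate(rates, probe=1.0, delta=1.0, tol=1e-2):
    """One per-GOP update: estimate slopes from two local points, then move bits."""
    slopes = [(distortion(i, r + probe) - distortion(i, r)) / probe
              for i, r in enumerate(rates)]
    mags = [abs(s) for s in slopes]
    if max(mags) - min(mags) <= tol:
        return rates  # slopes are (nearly) equal: keep the current allocation
    hi, lo = mags.index(max(mags)), mags.index(min(mags))
    new_rates = list(rates)
    new_rates[hi] += delta  # component with the largest |slope| gains bits
    new_rates[lo] -= delta  # component with the smallest |slope| loses bits
    return new_rates


rates = [40.0, 30.0, 30.0]  # starting allocation, total budget 100
for _ in range(200):        # one update per GOP
    rates = reallocate(rates)
```

With a fixed `delta` the allocation settles into a small neighbourhood of the equal-slope point rather than converging exactly; a shrinking step or a looser tolerance would be natural refinements.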

[0128] Effect of the Invention 

[0129] The invention provides rate-distortion (R-D) models that consider
inter-frame dependency for optimal bit allocation in error resilient video transcoding. A
sub-optimal scheme achieves similar performance with much lower complexity. 
Overall, the method according to the invention with variable bit allocation has 
superior performance compared to error-resilient transcoding schemes with fixed 
bit allocation. 

[0130] Although the invention has been described by way of examples of 
preferred embodiments, it is to be understood that various other adaptations and 
modifications may be made within the spirit and scope of the invention. Therefore, 
it is the object of the appended claims to cover all such variations and 
modifications as come within the true spirit and scope of the invention. 






